Long noncoding RNAs (lncRNAs) play a key role in many
important areas of epigenetics, stem cell biology, cancer,
signaling and brain function. This emerging class of RNAs
constitutes a large fraction of the transcriptome, with
thousands of new lncRNAs reported each year. The molecular
mechanisms of these RNAs are not well understood. Currently,
very little structural data exist. We review the available
lncRNA sequence and secondary structure data. Since
almost no tertiary information is available for lncRNAs, we
review crystallographic structures for other RNA systems and
discuss the possibilities for lncRNAs in the context of existing
constraints.



Introduction 

A new class of RNAs has recently emerged as a key player in
many rapidly growing areas of research, including epigenetics,
hormone signaling, development, stem cell biology, cancer, brain
function and plant biology. 1-9 The growth of this area has been
fueled by recent advances in sequencing technology. These RNAs
(long noncoding RNAs, or lncRNAs) are typically 1,000-10,000
residues in length. LncRNAs are often polyadenylated,
transcribed by RNA polymerase II and spliced. 3,6,7,10-13 While
some lncRNAs are found in the cytoplasm, most are localized in
the nucleus. Many lncRNAs are associated with histone
methylation, chromatin remodeling and subsequent epigenetic effects. 14
In the field of epigenetics, the mechanism by which epigenetic
factors find their targets remains largely a mystery. The
importance of lncRNAs has been underscored in the context of
mammalian genomes, where recent evidence suggests that lncRNAs
may provide a missing epigenetic link between DNA, histones
and methylation factors. 15

In humans, over 70% of the genome is actively transcribed. 16
In contrast, protein-coding genes constitute only 1-2% of
the genome. 17 The active transcription of non-protein-coding
genes gives rise mainly (80-90%) to lncRNAs. 18 While some
lncRNAs, such as MALAT1, are highly abundant transcripts,
many lncRNAs are expressed at low levels. However, low transcription
levels do not necessarily reflect lack of functionality. Studies on
the stability of lncRNAs 1 '- 1 have shown that lncRNAs have
stabilities comparable to those of mRNAs (albeit slightly lower on
average). Here, half-lives range from 30 min up to 16 h in the case
of MALAT1. Protein half-lives range from 30 min to 2 h. 20 We
note that transcription rates range from 1-50 kb/min. 21 From
these data, a picture of the nucleus emerges, where lncRNAs are
synthesized in minutes and may persist for hours.
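The "synthesized in minutes" estimate can be checked with a short calculation. The sketch below divides transcript length by elongation rate, using the 1-50 kb/min rates and the typical 1-10 kb lncRNA lengths quoted in the text. The function names are illustrative, and a constant elongation rate without pausing is a simplifying assumption of ours rather than a claim from the cited studies.

```python
from typing import Tuple

# Elongation rates quoted in the text (kb per minute).
SLOW_RATE_KB_PER_MIN = 1.0
FAST_RATE_KB_PER_MIN = 50.0


def transcription_time_min(length_kb: float, rate_kb_per_min: float) -> float:
    """Minutes needed to transcribe `length_kb` at a constant elongation rate."""
    if length_kb <= 0 or rate_kb_per_min <= 0:
        raise ValueError("length and rate must be positive")
    return length_kb / rate_kb_per_min


def synthesis_window_min(length_kb: float) -> Tuple[float, float]:
    """(fastest, slowest) synthesis time in minutes over the quoted rate range."""
    return (
        transcription_time_min(length_kb, FAST_RATE_KB_PER_MIN),
        transcription_time_min(length_kb, SLOW_RATE_KB_PER_MIN),
    )


if __name__ == "__main__":
    # Typical lncRNA lengths from the text: 1-10 kb.
    for length_kb in (1.0, 10.0):
        fast, slow = synthesis_window_min(length_kb)
        print(f"{length_kb:4.1f} kb transcript: {fast:.2f} to {slow:.1f} min")
```

Under these assumptions even a 10 kb transcript is completed within about 10 min at the slowest quoted rate, which is short compared with the half-lives of up to 16 h mentioned above.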

Long noncoding RNAs (lncRNAs) are very broadly defined
by two major characteristics: (1) length of the transcript (> 200
nts) and (2) having little or no potential for translation. 22 Some
lncRNAs ('macroRNAs') achieve incredible lengths, extending
beyond 90 kb. Examples include the 108 kb Air and the
91 kb Kcnq1ot1. 23-25 The term lncRNA is traditionally reserved
for regulatory RNAs. LncRNAs are often further divided into
categories based on their position relative to neighboring
protein-coding genes. Natural antisense transcripts (NATs) are
transcribed from the antisense strand of protein-coding genes,
overlapping at least one exon. Large intervening noncoding
RNAs, also known as long intergenic noncoding RNAs
(lincRNAs), are positioned far from protein-coding genes. Intronic
noncoding RNAs are uniquely transcribed from intron regions
of protein-coding genes, in either the sense or antisense direction.
Bidirectional lncRNAs are transcribed in the antisense direction
in the region of the promoter of a protein-coding gene. An
exact estimate of the number of lncRNAs is quite challenging
due to their cell-specific, tissue-specific, developmental
stage-specific and disease-specific expression profiles. The most recent
estimates place the number of lncRNAs in humans at ~15,000. 26
However, tens of thousands of lncRNAs have been profiled this
year alone. 27-30
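The two-part definition at the start of this paragraph (length > 200 nts, little or no potential for translation) can be expressed as a simple filter. The sketch below is a deliberately naive illustration: the 200 nt cutoff comes from the text, whereas the 100-codon open reading frame cutoff, the function names and the use of the longest sense-strand ORF as a stand-in for coding potential are assumptions made here for illustration. Dedicated coding-potential tools use much richer criteria.

```python
MIN_LNCRNA_LENGTH_NT = 200   # length cutoff stated in the text
MAX_ORF_CODONS = 100         # hypothetical cutoff for "little or no" coding potential
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop ORF on the given strand, in codons (stop excluded)."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in frame from the start codon to the first in-frame stop.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                best = max(best, (pos - start) // 3)
                break
    return best


def looks_like_lncrna(seq: str, max_orf_codons: int = MAX_ORF_CODONS) -> bool:
    """True if the transcript passes both criteria of the broad definition."""
    long_enough = len(seq) > MIN_LNCRNA_LENGTH_NT
    low_coding_potential = longest_orf_codons(seq) <= max_orf_codons
    return long_enough and low_coding_potential


if __name__ == "__main__":
    toy_noncoding = "A" * 300                                  # no start codon at all
    toy_coding = "ATG" + "GCT" * 150 + "TAA" + "A" * 100       # 151-codon ORF
    print(looks_like_lncrna(toy_noncoding), looks_like_lncrna(toy_coding))
```

The positional categories described above (antisense, intergenic, intronic, bidirectional) would require genome annotation and are not covered by this sequence-only sketch.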

In terms of function, nascent paradigms of lncRNA action
include, but are not limited to, critical regulatory roles in
embryonic stem cell pluripotency, 31 brain function, 32-34 subcellular
compartmentalization 35,36 and chromatin remodeling. 3,8,37 Many
have been linked to various diseases, such as cancer. 38 We note
that lncRNAs play key roles in intracellular and extracellular
signaling (SRA, Gas5, LINoCR, BC1, BORG and NRON) and
stress response (e.g., SAT III, PRINS, npc536, hsr omega
transcript, gadd7, HSR1 and BACE1-AS). More detailed discussion of
functional aspects of lncRNAs can be found in several excellent
recent review articles. 39 Due to the large sizes of intact lncRNAs
relative to typical biophysical systems, very few structural studies
of these RNAs have been performed. 40 By comparison, the
high-resolution 3-D structure of the intact ribosome (~5 kb in total)
required ~25 y for its solution. 41,42

For lncRNAs, the following questions remain unanswered:
(1) Are lncRNAs highly structured or disordered? (2) Do they
contain globular sub-domains, or are they organized linearly in
chains of stem-loops? (3) Do lncRNAs exist in ribonucleoprotein
complexes or as isolated RNAs that transiently interact with
proteins? (4) Do these molecules contain a compact core, or are they
more extended?

Mechanistic studies of lncRNAs have the potential to be more
challenging than those of ribosomes, because lncRNAs are neither as
highly conserved nor as highly expressed. Nonetheless, RNA molecules
are well known to utilize a wide spectrum of functional elements
at the sequence, secondary or tertiary level. RNA interference
and RNA silencing leverage sequence specificity to control
gene expression. Riboswitch RNAs regulate gene expression via
secondary structure. The ribosome uses its complex tertiary
structure to synthesize proteins, in a manner analogous to a
protein-based molecular machine. LncRNAs may or may not use aspects
of each of these three mechanisms to regulate gene expression. In
light of the tremendous variety of lncRNAs, it is possible that all
three of these mechanisms are employed by lncRNA systems. A
great deal of useful information can be produced using modern
structural biology techniques. In this review, we provide a
summary of current knowledge of sequence and structural features
of eukaryotic lncRNAs. Although studies of lncRNA tertiary
structure have yet to be performed, we examine known
crystallographic structures of other RNAs and explore the possibilities
that might occur in lncRNA systems.

Sequence Elements in lncRNAs

Some lncRNAs rely on Watson-Crick base pairing for functional
activity. This may be in the form of 'perfect' pairing, where a stretch
of the lncRNA forms a continuous sequence of Watson-Crick base
pairs with another RNA molecule, such as an mRNA. There are
also lncRNAs that implement 'imperfect' pairing, where a stretch
of Watson-Crick pairs is interspersed with non-Watson-Crick
base pairs. Finally, evidence exists for regions of lncRNAs directly
interacting with DNA. In some cases, it has been suggested that
base pairing between the RNA and DNA occurs, while in other
cases, triple helix mechanisms have been proposed. 43
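The distinction between 'perfect' and 'imperfect' pairing can be made concrete with a small sketch. It aligns two equal-length RNA segments in antiparallel orientation and reports which positions form Watson-Crick pairs. The sequences in the usage example are toy strings rather than real lncRNA or mRNA sequences, and the gapless alignment (no bulges or loops) and all function names are simplifying assumptions made here.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_profile(lncrna_segment: str, target_segment: str) -> List[bool]:
    """Per-position Watson-Crick status for a gapless antiparallel duplex.

    Both segments are written 5'->3'; the target is reversed so that
    position i of the lncRNA faces its antiparallel partner.
    """
    a = lncrna_segment.upper().replace("T", "U")
    b = target_segment.upper().replace("T", "U")[::-1]
    if not a or len(a) != len(b):
        raise ValueError("segments must be non-empty and of equal length")
    return [(x, y) in WATSON_CRICK for x, y in zip(a, b)]


def classify_duplex(lncrna_segment: str, target_segment: str) -> str:
    """'perfect' if every position is Watson-Crick, 'imperfect' if only some are."""
    profile = pairing_profile(lncrna_segment, target_segment)
    if all(profile):
        return "perfect"
    if any(profile):
        return "imperfect"
    return "unpaired"


if __name__ == "__main__":
    print(classify_duplex("GGAUCC", "GGAUCC"))  # self-complementary toy duplex
    print(classify_duplex("GGAUCC", "GGGUCC"))  # one U-G mismatch in the middle
```

Non-Watson-Crick positions such as G-U wobble pairs are simply counted as "not Watson-Crick" here, which matches the wording of the paragraph but ignores their energetic contribution.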

miRNA-sequestering lncRNAs. These lncRNAs provide
alternative miRNA binding sites to regulate expression levels of
protein-coding genes post-transcriptionally. Linc-MD1, involved
in muscle differentiation, acts as a competing endogenous RNA
(ceRNA), sequestering miR-133 and miR-135 from their target
genes. 44

Half-STAU1-binding site RNAs (1/2-sbsRNAs). These
lncRNAs bind to 3'-UTRs via Alu elements in a process known
as STAU1-mediated mRNA decay. 7 This event involves imperfect
base pairing between the Alu element of the lncRNA and
the Alu element of the mRNA. The interaction is recognized by
the dsRNA-binding protein STAU1 and results in degradation
of the mRNA.



Antisense lncRNAs. These lncRNAs may bind to mRNAs,
regulating their splicing. The long noncoding Zeb2NAT
transcript originates antisense to the 5' splice site of the Zeb2 mRNA.
The transcript is known to prevent splicing of this mRNA
region, preserving the internal ribosome entry site for efficient
translation. 45

Upstream lncRNAs. These lncRNAs may form triplex
complexes with DNA promoter regions. One such lncRNA originates
upstream of the dihydrofolate reductase gene (DHFR) in humans.
Here, upstream transcription is initiated at the alternative minor
promoter site, resulting in decreased occupancy of transcription
factors at the major promoter and subsequent repression of
gene expression. 46 Moreover, the noncoding RNA product that
originated upstream was found to interact directly with a major
DNA promoter site, forming a purine-purine-pyrimidine triplex.

Secondary Elements of lncRNA Structure

In addition to sequence, RNA secondary and tertiary structural
motifs often play a central role in the mode of action of RNA, be
it specific binding, allosteric, catalytic or structural. While few
structural studies of lncRNAs exist, we describe below the hints
of structure that have been uncovered to date.

Double stem loops in chromatin remodeling. Many
lncRNAs have been shown to play an important role in chromatin
remodeling. Large-scale identification of functional lncRNAs
has resulted from their association with chromatin proteins. 13,47
For example, a CLIP-seq investigation of RNA associated with
the SFRS1 splicing factor uncovered > 6,000 spliced noncoding
RNAs with unknown function. While the full functional
repertoire of lncRNAs remains to be delineated, 48 it is clear that
lncRNAs play a critical role in chromatin remodeling, often acting
in trans via association with chromatin-modifying enzymes. 47
Lee and coworkers identified > 9,000 lncRNAs in mouse embryonic
stem cells, which interact with the polycomb repressive complex,
PRC2. 49 EMSA analysis of a number of PRC2-interacting
lncRNAs suggested that binding occurs through EZH2, one
of the four polycomb protein components of PRC2. The remaining
three protein components of PRC2 are thought to further tighten
this interaction. The large number of identified lncRNAs,
including RepA/Xist, HOTAIR and Air, suggests the presence
of certain common features across the PRC2-interacting family
of lncRNAs. In addition, lncRNAs can also associate with
the LSD1/CoREST/REST complex, critical in H3K4
demethylation. The lincRNA HOTAIR is an excellent example of this
bifunctionality. 3

HOTAIR is ~2.2 kb in length and regulates the expression
of HoxD genes and a number of other genes by recruiting
the LSD1 and PRC2 histone modification complexes to targeted
loci. Deletion experiments on HOTAIR narrowed the interaction
sites down to two modular regions that are responsible for these
interactions: (1) a 5' 300 nt region that binds PRC2, and (2) a
646 nt region located downstream at the 3'-end, responsible for
binding to LSD1. This raises the question: while two key motifs
are located at the 5'- and 3'-ends of HOTAIR, does the remainder
of the sequence have functional importance? The intervening
sequence may provide the required distance in terms of spatial
organization between the two interaction sites. Alternatively, it may
contain motifs necessary for targeting, or comprise additional
protein-binding motifs required for activity, yet to be found. It
has been suggested that a double stem-loop RNA motif is present
in a PRC2-binding region. Binding motifs for LSD1 have not
been identified. 49,50

Similar to HOTAIR, an A-repeat subregion of Xist, known
to bind PRC2, has long been thought to form multiple double
stem-loop structures, encoded by the repeat sequences located
in this region. 51 Recent detailed chemical probing investigations
were not consistent with the proposed arrangements of secondary
structure elements and suggested a more complex secondary
fold, comprising elongated helical subdomains. 52 Until more
detailed structural and mechanistic investigations
of PRC2-interacting lncRNAs are performed, the critical RNA
motifs for association with PRC2 will not be clear.

Cloverleaf elements in 3'-end processing and in brain evolution.
A cloverleaf secondary architecture similar to that found in
tRNA has been found in many different regions of long noncoding
RNAs. One of its roles has been assigned to 3'-end maturation
of the lncRNA transcripts involved in subcellular organization. 53
These include the MALAT1 and NEAT1 lncRNAs, involved in
the formation of nuclear speckles and paraspeckles, respectively.
Non-canonical end maturation of the MALAT1 ncRNA involves
a cloverleaf secondary element at its 3'-end. 35 This subregion is
the most conserved element in both the MALAT1 and NEAT1
sequences, adopting a tRNA-like architecture. This structural
element is responsible for recruiting RNase P (involved in
maturation of tRNA molecules) for cleavage and generation of mature
MALAT1 transcripts. The remaining cleaved fragment is further
processed by RNase Z/tRNA nucleotidyl transferase to yield a
tRNA-like transcript (mascRNA), which is then shuttled to
the cytoplasm. Interestingly, the mature 3'-end of the MALAT1
sequence comprises only a relatively short, genomically
encoded poly(A) stretch, suggested to be stabilized by
two conserved U-rich motifs located upstream. The details of
this interaction are currently unknown. The same mechanism of
3'-end processing has been determined for the NEAT1_v2
transcript, which likewise generates a small, independent tRNA-like
molecule, named menRNA. 54

Interestingly, cloverleaf architecture has been found in a
subregion of another lncRNA, HAR1 (human accelerated
region), associated with neocortex development. 55 This region
covers 118 nucleotides, which are highly conserved across
vertebrates (2 nt change between chicken and chimpanzee), but
more divergent in humans (18 mutations relative to
chimpanzee). 56,57 Rapid changes in sequence between
chimpanzee and human have been associated with human brain
evolution. In vitro structure probing experiments on human
(hHAR) and chimpanzee (cHAR) HAR regions showed distinct
secondary folds. 55 In chimpanzee, cHAR adopts a rather
unstable and extended hairpin architecture; in humans, hHAR
folds into a cloverleaf-type element, consisting of a 4-way
junction. The authors noted that the chimpanzee sequence
could possibly adopt a cloverleaf architecture, but may require
a stabilizing factor, the necessity of which is likely to be
diminished in human HAR.

Secondary structures in steroid receptor chemistry. We have
recently performed extensive chemical and enzymatic investigations
of another long noncoding RNA, the steroid receptor RNA
activator, or SRA. 58 This study produced the first experimentally
determined secondary structure of an intact human lncRNA to
our knowledge (Fig. 1). SRA co-activates several steroid
receptors (e.g., ER, AR, TR, GR, RAR), and it is known to interact
directly with many proteins (e.g., SHARP, SLIRP, DAX-1, TR),
suggesting that it may play a scaffolding role in the transcription
complex. SRA has also been shown to interact with CTCF. 59 This
RNA was one of the first lncRNAs discovered in humans. Our
biochemical probing revealed a complex 2D architecture,
comprising four major subdomain regions. The identified secondary
elements range from small modular helical regions to complex
multiway junctions. In total, SRA contains 25 helical segments,
16 terminal loops, 15 internal loops and 5 junction regions. We
have also noticed that purine-rich sequences are highly conserved
and often located in single-stranded regions such as terminal,
internal and junction loops. The same trend in structural
preference is generally observed for rRNA. The vast majority of helices
in our structure were validated by covariance analysis using
multiple sequence alignment across vertebrates.
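Element counts such as "25 helical segments, 16 terminal loops" can be derived mechanically once a secondary structure is written down. The sketch below counts helices and terminal (hairpin) loops in a structure given in dot-bracket notation. The dot-bracket encoding, the function names and the toy three-way-junction structure are assumptions for illustration; the actual SRA structure is not reproduced here. In this simplified definition a helix is an uninterrupted run of stacked base pairs, so a bulge splits one helix into two.

```python
from typing import Dict


def parse_pairs(dot_bracket: str) -> Dict[int, int]:
    """Map each opening-bracket index to the index of its closing partner."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def count_elements(dot_bracket: str) -> Dict[str, int]:
    """Count helices (runs of stacked pairs) and terminal loops (hairpins)."""
    pairs = parse_pairs(dot_bracket)
    # A pair (i, j) starts a new helix unless (i - 1, j + 1) is also a pair.
    helices = sum(1 for i, j in pairs.items() if pairs.get(i - 1) != j + 1)
    # A pair closes a terminal loop if nothing between i and j is paired.
    terminal_loops = sum(
        1 for i, j in pairs.items()
        if "(" not in dot_bracket[i + 1:j] and ")" not in dot_bracket[i + 1:j]
    )
    return {"helices": helices, "terminal_loops": terminal_loops}


if __name__ == "__main__":
    toy_three_way_junction = "((..((...))..((...))..))"
    print(count_elements(toy_three_way_junction))
```

Internal loops and multiway junctions could be counted in a similar way by examining how many helices branch from each closing pair, which is omitted here for brevity.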

Previously Studied Tertiary Structures 
of Other RNA Systems 

Determination of three-dimensional structures of lncRNAs has not
been attempted to date. Here, we review previously solved structures of
other RNAs and discuss these in the context of lncRNAs. Prior
to 2000, the set of three-dimensional RNA structures roughly
consisted of tRNA, various ribozyme and aptamer RNAs, the
group I and group II introns, RNA helices and quadruplexes,
portions of the bacterial ribosome and components of the
spliceosome. The initial high-resolution structures of the ribosome
published in 2000 spurred a large number of additional RNA
crystallographic studies. These include many different riboswitch
RNAs, TLS RNA, RNase P, the signal recognition particle, the
HIV-1 frame-shifting element, regions of telomerase RNA, as
well as many other ribosome constructs. 60 The ribosome and
introns remain the only large RNA (> 200 nts) high-resolution
crystallographic structures solved to date.

The group I and group II introns are isolated compact RNAs 
characterized by numerous RNA helices capped with RNA stem- 
loops (Fig. 2A). 61,62 The helices are connected through various 
junctions. Tertiary contacts between helices, loops and junctions 
also exist. 

An important component of the telomerase complex is
telomerase RNA, which is directly bound to telomerase reverse
transcriptase and acts as the template for nucleotide additions to
telomeric regions (Fig. 2B). 63 Human telomerase RNA is ~450 nts.
In yeast, it was shown that the Est1p binding domain on the RNA
can be moved to other locations within the RNA while
maintaining function, suggesting that regions of telomerase RNA are
highly flexible. 64

Figure 1. The first experimentally determined secondary structure of an intact lncRNA, to our knowledge. The steroid receptor
RNA activator (SRA) lncRNA contains 4 subdomains and 25 helices. The structure was determined using four methods of chemical (SHAPE, in-line,
DMS) or enzymatic (RNase V1) probing. Covariance analysis based on multiple sequence alignment across vertebrates was used to help validate the
structure. In SHAPE probing (selective 2'-hydroxyl acylation analyzed by primer extension), high reactivity corresponds to high mobility and low
likelihood for base pairing; low reactivity corresponds to low mobility and high likelihood for base pairing. Orange, high SHAPE reactivity. Yellow, medium
reactivity. Grey, low reactivity. Black, no reactivity. Insets: Red, SHAPE reactivity capillary electrophoresis trace for lncRNA. Green, raw blank trace.



Figure 2. Examples of RNA tertiary structures solved by
crystallography. (A) Group II intron, solved by Pyle, Toor and coworkers. 62 The intron
is a highly structured isolated RNA with a compact core. (B) Telomerase
RNA, solved by Skordalakes and coworkers. 63 (C) RNase P, solved by
Mondragon and coworkers. 65 RNase P is a highly structured RNA with
a single protein binding domain. (D) Ribosome, solved by Ramakrishnan and
coworkers. 66 The ribosome is a highly structured and highly compact RNA
complex containing ~50 proteins that help stabilize the RNA structure.
The ribosome contains a limited number of factor binding sites.
Different factors bind to the same binding sites, regulating protein synthesis.



RNase P is a ribonuclease responsible for cleaving a precursor 
sequence from tRNA. The structure is dominated by the RNA 
(Fig. 2C). 65 A small protein component increases the affinity of 
tRNA to RNase P. This RNA is highly structured and compact, 
similar to the group I and group II introns. 

The ribosome is the universally conserved molecular machine
responsible for protein synthesis (Fig. 2D). 66 In bacteria, the
ribosome consists of two subunits. The small subunit (30S)
contains a ~1.5 kb RNA (16S rRNA) and ~20 different proteins. The
large subunit (50S) contains a ~3 kb RNA (23S rRNA), a ~120
nt RNA (5S rRNA) and ~35 different proteins. The two subunits
fit together, producing a large cavity between the two, through
which the tRNAs enter and exit. Most of the protein factors bind
to the ribosome at the GTPase-associated center or at one of the
three tRNA binding sites. For example, both elongation factors
EF-Tu and EF-G bind to the same location on the ribosome at
different stages of the elongation cycle. The scaffold of ribosome
structure is RNA, while many proteins are interspersed throughout
the ribosome, providing structural stability to the overall
architecture. Functional factors, which help the ribosome proceed
through protein synthesis in a GTP-dependent manner, come on
and off the ribosome at various stages of protein synthesis.

Macromolecular Complexes and lncRNA
Quaternary Interactions

So far, the main evidence for lncRNA participation in quaternary
complex formation has been obtained for NEAT1 transcripts.
Here, two NEAT1 isoforms (NEAT1_v1: 3.7 kb and NEAT1_v2:
22.7 kb for human) are expressed from the same promoter
with similar expression levels. 67 Both isoforms are involved in
the formation of specific nuclear compartments called
paraspeckles. These are ribonucleoprotein complexes characterized
by three distinct proteins (e.g., p54, PSF and PSP1), which all
contain RNA-binding motifs. 68,69 The proposed model of
paraspeckle assembly relies on the initial association of NEAT1_v2
with PSF and p54, followed by subsequent recruitment of PSP1
and NEAT1_v1. 67 Depletion of either NEAT1_v2, p54 or PSF
results in paraspeckle disintegration; however, depletion of PSP1
did not affect the architecture. Based on immuno-hybridization
and in situ hybridization electron microscopy studies, paraspeckle
assembly requires association of multiple NEAT1 transcripts,
creating a fiber-like network. 70 Interestingly, using DNA probes
specific for subregions of NEAT1, the 5' and 3' ends of NEAT1
were localized to the periphery of paraspeckles, while the central
regions of the longer NEAT1 (NEAT1_v2) transcripts were localized
to the inner part of these bodies. The NEAT1 arrangements in
the inner region appear to be distributed uniformly. Expression
of the shorter isoform of NEAT1 (NEAT1_v1), lacking the 3'-end,
cannot rescue paraspeckle formation. These observations
suggest the presence of certain functional elements in the RNA
molecule. Paraspeckles are also formed with a relatively fixed
diameter, but vary in their length. Mouse NEAT1 transcripts
(shorter by 2 kb compared with human) are associated with a 9%
reduction in diameter relative to human, leading the authors to conclude that
the length of the NEAT1 transcript might also be a limiting factor
in this arrangement. However, it remains unclear how the final
NEAT1 arrangement in paraspeckles is accomplished. There
are a number of possibilities here. This can be the direct result
of RNA-RNA interactions between NEAT1 transcripts to create
a complex macromolecular platform. Alternatively, proteins
can serve as bridges to connect multiple NEAT1_v2 transcripts,
especially given that p54 is able to form heterodimers
with PSF and PSP1 proteins. 68,71

Perspectives on lncRNA Structure and Mechanism

LncRNAs are not likely to exist in ribosome-like
ribonucleoprotein complexes. As the ribosome is the only RNA structure
> 1 kb solved to date, we ask the question, are lncRNA systems
similar to ribosomes in their structural composition? We note
that our recent structural study of the SRA lncRNA revealed
RNA secondary structure similar to the ribosome in its overall
architecture. This study, which used multiple forms of extensive
chemical probing combined with multiple sequence analysis,
demonstrated that SRA has 4 sub-domains with numbers of
helices and loops comparable (in proportional terms) to a ribosome
subunit. We currently have no information on the tertiary
structure of this lncRNA or other lncRNAs. In addition, we do not
know if lncRNAs exist in ribonucleoprotein complexes (RNPs)
or as isolated RNAs.

To compare with the ribosome, we note that it consists of a
few long RNAs complexed with many unique (i.e., non-identical)
proteins. The total number of protein-coding genes in the
human genome is estimated to be ~21,000. 72 As most proteins
reside in the cytoplasm, we can reasonably estimate that the
number of proteins in the nucleus is $N_{\text{protein,nucl}} < 21{,}000$. Many
thousands of lncRNAs have been identified, with most residing
in the nucleus. As a very conservative estimate, we use
$N_{\text{lncRNA,nucl}} > 3{,}000$, giving
$N_{\text{lncRNA,nucl}}/N_{\text{protein,nucl}} > 1/7$. We note that
many lncRNAs are as large as or larger than the ribosome. In the
case of the ribosome itself, the ratio of RNA molecules to
protein molecules is $N_{\text{rRNA}}/N_{\text{rp}} \sim 1/25$ for each subunit. Thus, even
if every single unique protein encoded in the human genome
formed a complex with a lncRNA, we would still not expect
lncRNAs to be similar in structural composition to ribosomes.
There are not enough unique proteins to form ribosome-like
complexes (with ~25 unique proteins) for each lncRNA. Using
an optimistic estimate that 1 in 10 of all proteins binds to a
lncRNA, this would still provide less than 1 unique protein per
lncRNA. Therefore, we conclude that lncRNAs in the nucleus
are not likely to exist in ribosome-like RNP complexes. A few
lncRNAs could theoretically exist in ribosome-like complexes.
These complexes would be more likely to exist in the cytoplasm.
The following possibilities remain: (1) lncRNAs exist in RNP
complexes with many repeats of a few proteins, (2) lncRNAs
exist in complexes with only a few proteins or (3) lncRNAs exist
as isolated RNAs that transiently bind proteins as needed for
function. We note that in the case of (1), to produce a
protein density (number of proteins per RNA) similar to that of
ribosomes, we would require ~10 proteins bound per 1 kb of lncRNA (e.g., 90
identical proteins bound to MALAT1). While (1) is certainly
possible, we favor (2) or (3).
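The counting argument above can be reproduced in a few lines. The sketch below uses only the estimates quoted in the text (~21,000 protein-coding genes, > 3,000 nuclear lncRNAs, ~25 proteins per ribosomal RNA, 1 in 10 proteins binding a lncRNA, and the bacterial subunit compositions given earlier). The constant and function names are illustrative, and the bounds are treated as exact values for simplicity.

```python
from fractions import Fraction

N_PROTEIN_GENES = 21_000        # upper bound on unique nuclear proteins (text estimate)
N_LNCRNA_NUCLEAR = 3_000        # conservative lower bound on nuclear lncRNAs
PROTEINS_PER_RIBOSOMAL_RNA = 25  # approximate ribosome-like protein complement per RNA


def lncrna_to_protein_ratio(n_lncrna: int, n_protein: int) -> Fraction:
    """Ratio of lncRNA species to unique protein species."""
    return Fraction(n_lncrna, n_protein)


def unique_proteins_required(n_lncrna: int, proteins_per_rna: int) -> int:
    """Unique proteins needed if every lncRNA had a ribosome-like complement."""
    return n_lncrna * proteins_per_rna


def unique_proteins_per_lncrna(n_protein: int, binding_fraction: Fraction,
                               n_lncrna: int) -> Fraction:
    """Unique proteins available per lncRNA if a fraction of all proteins bind lncRNAs."""
    return n_protein * binding_fraction / n_lncrna


def protein_density_per_kb(n_proteins: float, rna_length_kb: float) -> float:
    """Proteins bound per kb of RNA."""
    return n_proteins / rna_length_kb


if __name__ == "__main__":
    print(lncrna_to_protein_ratio(N_LNCRNA_NUCLEAR, N_PROTEIN_GENES))
    print(unique_proteins_required(N_LNCRNA_NUCLEAR, PROTEINS_PER_RIBOSOMAL_RNA))
    print(float(unique_proteins_per_lncrna(N_PROTEIN_GENES, Fraction(1, 10),
                                           N_LNCRNA_NUCLEAR)))
    # Bacterial subunits from the text: 30S (~1.5 kb, ~20 proteins),
    # 50S (~3 kb + ~0.12 kb, ~35 proteins).
    print(protein_density_per_kb(20, 1.5), protein_density_per_kb(35, 3.12))
```

With these inputs, a ribosome-like complement for every lncRNA would require 75,000 unique proteins, well above the ~21,000 available, and the 1-in-10 scenario leaves 0.7 unique proteins per lncRNA. The subunit densities of roughly 11-13 proteins per kb are consistent with the ~10 proteins per kb figure used in the text.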

The large diversity of lncRNAs may produce complexes similar
in overall form to telomerase RNA, RNase P or the group
I and II introns. While we suspect lncRNA complexes are not
similar to ribosomes, we cannot rule out similarity to RNase P,
telomerase RNA or the group I and group II introns. In the case
of an 'RNase P-like' complex, the lncRNA would be highly
structured and compact, containing a main protein binding site, where
various proteins bind (Fig. 3A). Alternatively, the lncRNA could
be decentralized without a compact core. It may contain several
distinct protein-binding sites and act as a flexible structural tether,
as suggested for the telomerase RNA (Fig. 3B). 64 The lncRNA
could also be a stand-alone, highly structured RNA, similar to
the group I and group II introns. In this case, the lncRNA may
transiently bind proteins as needed. Finally, another possibility is
a highly disordered RNA, containing loosely organized protein
binding domains (Fig. 3C). Our experimentally determined
secondary structure of SRA is highly organized and more suggestive
of a structure with characteristics of Figure 3A. We enumerate
various combinations of secondary structure and tertiary
structure in Table 1, with column 1 corresponding to Figure 3A,
column 6 corresponding to Figure 3B, and columns 6-7 possibly
corresponding to Figure 3C.

Possibilities for structure-based mechanisms of lncRNAs.
Although many more proteins have been studied in mechanistic 
detail relative to RNAs, a diverse portfolio of RNA mechanisms 
has emerged, based on either sequence, secondary or tertiary 
organization of RNA molecules, as well as combinations of these 
mechanisms. In sequence-based mechanisms, such as RNA inter- 
ference by siRNAs and RNA silencing by miRNAs, the RNA 
plays a very minor structural role. Here, the role of RNA is 
mainly to add sequence specificity to the process, allowing the 
RISC complex to find its target and trigger a largely protein- 
based regulation mechanism. 73,74 

Table 1. Possibilities for structural configurations of lncRNAs

Columns: 1 2 3 4 5 6 7 8
Rows: Core secondary structure? / Binding domain secondary
structure? / Core tertiary structure? / Binding domain tertiary
structure?

Columns 1-8 represent different RNA structural configurations. Column
1 represents a highly structured configuration similar to the ribosome.
Column 8 represents an unstructured RNA. Columns 2-7 represent
various intermediate cases. Columns 6-8 represent a decentralized
structural configuration. Our recent study demonstrates that the entire
SRA lncRNA has well-organized secondary structure, corresponding to
columns 1-3, depending on the degree of tertiary structure. All cases
may also include single-stranded regions that become organized upon
protein binding.

Over the past decade, a new mechanism of regulation has
emerged, which is almost entirely based on RNA secondary
structure. 75-80 In riboswitch RNA systems, two secondary
structures compete with each other to control termination of
transcription (some riboswitch RNAs also control translation by
sequestering the start codon). Here, one sequence in the 5'-UTR
of the mRNA codes for two different secondary structures.
The presence or absence of a metabolite selects one of the two
structures, switching gene expression on or off. For example, in
the case of the SAM-I riboswitch, the presence of a metabolite
(SAM) causes the RNA to fold into a compact aptamer, favoring
formation of the transcriptional terminator helix, turning gene
expression of SAM synthetase off. In the absence of the
metabolite, a second, alternative helix is formed, preventing formation
of the transcriptional terminator helix, turning gene expression
on. Riboswitches are more "secondary structure specific" than
"sequence specific", often mandating stochastic context-free
grammar algorithms for searches, as opposed to more conventional
BLAST-like searches. Interestingly, artificial riboswitch-like
systems were first designed and produced in the lab and only later
discovered in bacteria. 81

RNA mechanisms based on tertiary structure are often
allosteric and may be described by 'induced-fit' or 'conformational
selection'. 78,82,83 In induced-fit, an event, such as protein- or
ligand-binding, triggers a large conformational change. In
conformational selection, the system is often frustrated between
two conformations. A protein- or ligand-binding event shifts
the equilibrium to one of the two conformations. In the case of
the ribosome, many conformational fluctuations occur
simultaneously and at different time scales. Protein binding or GTP
hydrolysis events act to synchronize the fluctuations, shifting the
equilibrium to the next basin in the energy landscape, allowing
the ribosome to progress through the elongation cycle. 84-87
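The equilibrium shift described for conformational selection can be illustrated with a minimal two-state model. This is a generic textbook thermodynamic sketch, not a model taken from the cited studies: the RNA interconverts between conformations A and B, and the ligand is assumed to bind only B. The parameter names (`k_conf`, `k_d`) and the numbers in the usage example are hypothetical.

```python
def fraction_in_state_b(k_conf: float, ligand_conc: float, k_d: float) -> float:
    """Fraction of molecules in conformation B (ligand-free or ligand-bound).

    k_conf      : [B]/[A] equilibrium constant in the absence of ligand
    ligand_conc : free ligand concentration (same units as k_d)
    k_d         : dissociation constant of the ligand for conformation B

    Statistical weights: A = 1, free B = k_conf, bound B = k_conf * L / k_d.
    """
    if k_conf < 0 or ligand_conc < 0 or k_d <= 0:
        raise ValueError("k_conf and ligand_conc must be >= 0 and k_d > 0")
    weight_b = k_conf * (1.0 + ligand_conc / k_d)
    return weight_b / (1.0 + weight_b)


if __name__ == "__main__":
    # Hypothetical system in which B is a minor state without ligand.
    for ligand in (0.0, 1.0, 9.0, 99.0):
        print(ligand, round(fraction_in_state_b(0.1, ligand, 1.0), 3))
```

In this picture the ligand does not create conformation B; it only stabilizes a state that is already sampled, so the B population rises smoothly with ligand concentration. An induced-fit description would instead have the ligand bind first and the conformational change follow.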

Time scales and order of events. In addition to the
three-dimensional structure of lncRNAs, the order of events and
kinetics of these systems are essential for mechanistic understanding.
For example, crystallographic structures have been solved for
many ribosome complexes; however, the mechanism of ribosome
translocation is still not understood. Rapid kinetics
studies 88,89 define the overall order of events. Single-molecule studies
help elucidate the mechanism for transitions between states. 90 A
fusion of structural and kinetic information is required to unlock
mechanism. Interestingly, the overall order of events can often
be obtained before high-resolution crystallographic structures are
available.

To illustrate potential time scales involved in lncRNA mechanism, 
we consider the lncRNA DBE-T, a key component of the 
epigenetic switch associated with facioscapulohumeral muscular 
dystrophy (FSHD). 37 This lncRNA is a cis-acting tether that 
recruits the epigenetic factors D4Z4 and Ash1L to the D4Z4 
binding element (DBE) on chromatin, driving histone methylation 
and 4q35 gene transcription. Starting from the initial state (A), 
one possible order of events may be (Fig. 4): (B) transcription, 
(C) lncRNA folding, (D) epigenetic protein binding to the lncRNA, 
(E) epigenetic protein binding to the chromatin, and (F) action of 
the epigenetic protein (e.g., histone methylation). Each of these 
steps will have its own time scale. Identifying the rate-limiting 
step will yield significant insight into the mechanism of lncRNA 
action. Other scenarios are also possible, involving different 
orders of events and different combinations of steps. For other 
classes of lncRNAs, entirely different events may occur. 
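
The idea of a rate-limiting step can be made concrete with a minimal 
sketch. We assume, purely for illustration, that the steps occur 
strictly in sequence and that each is an irreversible first-order 
process, so the mean waiting time of a step is the inverse of its rate 
and the mean total time is the sum over steps. All rate values below are 
hypothetical placeholders, not measurements for DBE-T or any other 
lncRNA. 

```python
# Hypothetical first-order rates (per second) for a sequential pathway.
# The numbers are placeholders chosen only to illustrate the bookkeeping.
HYPOTHETICAL_RATES = {
    "transcription": 0.01,
    "lncRNA folding": 0.1,
    "protein binds lncRNA": 1.0,
    "protein binds chromatin": 0.5,
    "histone methylation": 0.02,
}


def mean_step_times(rates):
    """Mean waiting time of each step, 1/rate, for first-order steps."""
    return {step: 1.0 / rate for step, rate in rates.items()}


def total_mean_time(rates):
    """Mean time to complete all steps when they occur strictly in order."""
    return sum(mean_step_times(rates).values())


def rate_limiting_step(rates):
    """Step with the longest mean waiting time (the slowest step)."""
    times = mean_step_times(rates)
    return max(times, key=times.get)


if __name__ == "__main__":
    for step, seconds in mean_step_times(HYPOTHETICAL_RATES).items():
        print(f"{step}: {seconds:.1f} s")
    print("total:", total_mean_time(HYPOTHETICAL_RATES))
    print("rate-limiting:", rate_limiting_step(HYPOTHETICAL_RATES))
```

With a different order of events, or with reversible or parallel steps, 
the total time is no longer a simple sum, and a full kinetic scheme 
would be needed. 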



Figure 3. Possibilities for lncRNA three-dimensional architecture. These 
homology models represent concepts for possible lncRNA 3D structures. 
(A) Compact core. The lncRNA (pink) contains a compact tertiary core. 
The lncRNA may have a main protein (green) binding site, responsible 
for binding various protein factors. (B) De-centralized scaffold. In 
this scenario, the lncRNA does not have a compact core. The lncRNA may 
have several protein (yellow) binding sites. (C) Loosely organized 
protein binding domain with regions of unstructured RNA. The lncRNA may 
contain several long stretches of disordered single-stranded RNA. 






[Figure 4 diagram. Panels A-F: DNA/chromatin; transcription; lncRNA 
folding (rate r_f); lncRNA binds protein (epigenetic factor, rate 
r_r-p); protein binds to chromatin (rate r_p-d); methylation 
(epigenetic mark, rate r_Me). Inset: alternative orders of events.] 

Figure 4. An example of potentially relevant time scales for lncRNA 
activity: DBE-T lncRNA. DBE-T is a cis-acting tether that recruits 
epigenetic factors to chromatin. Steps A-F each have an associated 
timescale, t. (A) Initial state. (B) Transcription of DBE-T. (C) DBE-T 
folding. (D) The epigenetic factor binds to the lncRNA. (E) The 
epigenetic factor binds to chromatin. (F) The epigenetic factor marks 
the chromatin (e.g., histone methylation). 



Conclusions 

The structural biology of lncRNAs presents a brave new RNA 
world, where many fundamental questions have not been 
addressed. With the thousands of new lncRNAs recently 
discovered in disparate areas of biology, it is likely that a zoo of 
distinct structural architectures and structural mechanisms will 
be revealed. These may be sequence-based, secondary 
structure-based, tertiary structure-based, or some unusual 
combination of these. The diverse range of lncRNA structures will be 
accompanied by a corresponding array of kinetic mechanisms. 



As RNA molecules are notoriously difficult to crystallize, it 
may be useful to first apply alternative strategies to gain 
three-dimensional information about lncRNAs. Ultimately, the 
identification of common structural features and structure/function 
relationships will help us understand the role of lncRNAs in 
development and disease. As with many established therapeutic 
strategies, mechanistic understanding will help lay the 
foundation for development of lncRNA-based therapy. 